
 Economy


Long-form factuality in large language models

Neural Information Processing Systems

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method we call Search-Augmented Factuality Evaluator (SAFE). SAFE uses an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact through a multi-step reasoning process that sends search queries to Google Search and determines whether the search results support the fact. Furthermore, we propose extending the F1 score as an aggregated metric for long-form factuality.
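The SAFE loop and the F1 extension can be sketched as follows. This is a minimal illustration, not the authors' implementation: `split_facts`, `make_query`, `search`, and `is_supported` are hypothetical callables standing in for the LLM and Google Search calls, and the F1 variant assumes precision is the share of supported facts while recall is capped at a target number `k` of supported facts a user would want.

```python
from typing import Callable, List, Tuple


def rate_facts(
    response: str,
    split_facts: Callable[[str], List[str]],         # hypothetical: LLM fact splitter
    make_query: Callable[[str, List[str]], str],     # hypothetical: LLM query writer
    search: Callable[[str], List[str]],              # hypothetical: search-engine wrapper
    is_supported: Callable[[str, List[str]], bool],  # hypothetical: LLM judge
    max_queries: int = 3,
) -> Tuple[int, int]:
    """Return (supported, not_supported) fact counts for one response."""
    supported = not_supported = 0
    for fact in split_facts(response):
        evidence: List[str] = []
        for _ in range(max_queries):
            # Each new query may condition on the evidence gathered so far.
            evidence.extend(search(make_query(fact, evidence)))
        if is_supported(fact, evidence):
            supported += 1
        else:
            not_supported += 1
    return supported, not_supported


def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """F1 with recall capped at k supported facts (assumed form of the extension)."""
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)


# Toy stubs: a "search engine" that only knows about Paris.
counts = rate_facts(
    "Paris is in France. The Moon is made of cheese.",
    split_facts=lambda r: [s.strip() for s in r.split(".") if s.strip()],
    make_query=lambda fact, evidence: fact,
    search=lambda q: [q] if "Paris" in q else [],
    is_supported=lambda fact, evidence: len(evidence) > 0,
)
print(counts, f1_at_k(*counts, k=2))  # (1, 1) 0.5
```

Capping recall at `k` means a response is not rewarded for piling on facts beyond what a user would want, while precision still penalizes every unsupported claim.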


Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis

Neural Information Processing Systems

Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking information beyond the numerical series. This oversight stems from the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first-cut multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance gains from extending unimodal TSF to multimodal TSF, with mean squared error reduced by over 15% in general and by up to 40% in domains with rich textual data. More importantly, our dataset and library open up broader applications, impacts, and research topics to advance TSA.
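To make the reported "mean squared error reduction" concrete, the sketch below compares a numeric-only forecast with a multimodal one and computes the relative MSE drop. The fusion rule (a fixed weighted average of a numeric forecast and a text-informed forecast) is an assumption for illustration only, not MM-TSFlib's design, and the numbers are made up rather than taken from the paper.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def late_fusion(numeric: Sequence[float], text_based: Sequence[float],
                weight: float) -> List[float]:
    """Assumed fusion rule: weighted average of the two forecasts."""
    return [(1 - weight) * n + weight * s for n, s in zip(numeric, text_based)]


def mse_reduction(unimodal_mse: float, multimodal_mse: float) -> float:
    """Relative MSE reduction, e.g. 0.15 means a 15% reduction."""
    return (unimodal_mse - multimodal_mse) / unimodal_mse


# Made-up toy data, not results from Time-MMD.
target = [10.0, 12.0, 14.0]
numeric_forecast = [11.0, 11.0, 13.0]   # unimodal model output
text_forecast = [10.0, 12.0, 15.0]      # hypothetical text-informed output
fused = late_fusion(numeric_forecast, text_forecast, weight=0.5)

base = mse(numeric_forecast, target)
multi = mse(fused, target)
print(f"{mse_reduction(base, multi):.1%}")  # 83.3%
```

A "15% reduction" in the abstract's sense corresponds to `mse_reduction(...)` returning 0.15 when averaged over the evaluated settings.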


Anthropic CEO warns AI will destroy half of all white-collar jobs

Mashable

By now, you've likely already heard that some companies want to replace human workers with AI. Now, the CEO of one of the biggest AI companies is warning that AI may be coming for your job sooner than expected. In an interview with Axios, Anthropic CEO Dario Amodei said that AI could "wipe out" as much as half of all entry-level white-collar jobs. Amodei, who runs the OpenAI competitor behind the ChatGPT rival Claude, said that the resulting job loss would cause a spike in unemployment as high as 20 percent in the next five years. Just this week, Mashable covered a new report which found that AI is already affecting the number of entry-level jobs in the tech sector and, in turn, young people who've just graduated into the workforce.


Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models

Neural Information Processing Systems

Presenting users with diverse responses from foundation models is crucial for enhancing user experience and accommodating varying preferences. However, generating multiple high-quality, diverse responses without sacrificing accuracy remains a challenge, especially when using greedy sampling. In this work, we propose a novel framework, Synthesize-Partition-Adapt (SPA), that exploits the abundant synthetic data available in many domains to elicit diverse responses from foundation models. Using the signal provided by data attribution methods such as influence functions, SPA partitions the data into subsets, each targeting a distinct aspect of the data, and trains multiple model adaptations, each optimized for one subset. Experimental results demonstrate that the approach diversifies foundation model responses while maintaining high quality, as shown on the HumanEval and MBPP tasks in the code generation domain and on several tasks in the natural language understanding domain, highlighting its potential to enrich user experience across various applications.
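The abstract does not specify how attribution scores become subsets. The sketch below assumes each synthetic example already carries a scalar attribution score (for instance, an influence-function estimate computed elsewhere), sorts the examples by that score, and cuts the ranking into `k` near-equal groups. `adapt` is a hypothetical stand-in for training one model adaptation per subset (e.g. a small fine-tuned adapter); it is not an API from the paper.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition_by_score(examples: Sequence[T], scores: Sequence[float],
                       k: int) -> List[List[T]]:
    """Split examples into k near-equal groups by descending attribution score."""
    if len(examples) != len(scores):
        raise ValueError("need exactly one score per example")
    if not 1 <= k <= len(examples):
        raise ValueError("k must be between 1 and the number of examples")
    order = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    size, extra = divmod(len(order), k)
    parts: List[List[T]] = []
    start = 0
    for j in range(k):
        # The first `extra` groups take one additional example each.
        end = start + size + (1 if j < extra else 0)
        parts.append([examples[i] for i in order[start:end]])
        start = end
    return parts


def partition_then_adapt(examples: Sequence[T], scores: Sequence[float], k: int,
                         adapt: Callable[[List[T]], R]) -> List[R]:
    """Train one (hypothetical) adaptation per subset and return them all."""
    return [adapt(part) for part in partition_by_score(examples, scores, k)]


parts = partition_by_score(["a", "b", "c", "d", "e"], [0.1, 0.9, 0.5, 0.3, 0.7], k=2)
print(parts)  # [['b', 'e', 'c'], ['d', 'a']]
```

Under greedy decoding each adapted model yields one deterministic response, so `k` adaptations can presumably supply up to `k` distinct candidates without resorting to temperature sampling.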


AI could erase half of entry-level white collar jobs in 5 years, CEO warns

ZDNet

Just one week after Anthropic released its most advanced AI models to date, Opus 4 and Sonnet 4, Anthropic CEO Dario Amodei warned in an interview with Axios about the future of jobs in an AI-centric world. AI could be responsible for eliminating half of all entry-level white-collar jobs -- while spiking unemployment to 10-20% -- in the next one to five years, Amodei said in the interview. His motivation for speaking up, Amodei said, is to help people prepare adequately and encourage AI companies and the government to be candid about the change. "Most of them [workers] are unaware that this is about to happen," Amodei told Axios. "It sounds crazy, and people just don't believe it."


From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection

Neural Information Processing Systems

This paper introduces a novel approach that leverages Large Language Models (LLMs) and Generative Agents to enhance time series forecasting by reasoning across both text and time series data. With language as a medium, our method adaptively integrates social events into forecasting models, aligning news content with time series fluctuations to provide richer insights. Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning to evaluate predictions. This enables the model to analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic of news and the robustness of the agent's output. By integrating selected news events with time series data, we fine-tune a pre-trained LLM to predict sequences of digits in time series. The results demonstrate significant improvements in forecasting accuracy, suggesting a potential paradigm shift in time series forecasting through the effective utilization of unstructured news data.
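A rough sketch of how selected news and a numeric series might be combined into one text example for fine-tuning an LLM to emit digits. Everything here is an assumption for illustration: the `NewsItem` type, the prompt layout, the fixed-precision digit serialization, and `is_relevant`, which is a hypothetical placeholder for the LLM-agent filter described in the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Sequence


@dataclass
class NewsItem:
    day: date
    text: str


def serialize_series(values: Sequence[float], decimals: int = 1) -> str:
    """Render values as fixed-precision digit strings, comma-separated."""
    return ", ".join(f"{v:.{decimals}f}" for v in values)


def build_example(values: Sequence[float], news: List[NewsItem], start: date,
                  end: date, is_relevant: Callable[[str], bool],
                  decimals: int = 1) -> str:
    """Keep news inside [start, end] that passes the relevance filter."""
    selected = [n.text for n in news if start <= n.day <= end and is_relevant(n.text)]
    news_block = "\n".join(f"- {t}" for t in selected) or "- (none)"
    return f"News:\n{news_block}\nSeries: {serialize_series(values, decimals)}\nNext:"


news = [
    NewsItem(date(2024, 7, 1), "Heatwave expected across the region"),
    NewsItem(date(2024, 7, 2), "Local team wins cup"),
    NewsItem(date(2024, 6, 1), "Heatwave last month"),
]
prompt = build_example([101.2, 99.8, 104.5], news, date(2024, 7, 1), date(2024, 7, 3),
                       is_relevant=lambda t: "heatwave" in t.lower())
print(prompt)
```

In the paper's setup the filter is refined iteratively by agents reviewing forecast errors; a keyword test stands in here only so the sketch runs on its own.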


Is AI making it harder for new college grads to get hired in tech?

ZDNet

Once upon a time, Silicon Valley's move-fast-and-break-things culture welcomed college grads with open arms. Tech companies eagerly hired younger, less experienced talent, driven by enthusiasm for fresh ideas and a financial climate that pointed to sunny days ahead. All of that changed suddenly and dramatically with the COVID-19 pandemic. Today, the unfettered hiring mindset across tech has been replaced by caution and a prioritization of experience. At the same time, new AI tools are starting to automate many of the routine tasks that would traditionally have been handled by younger, entry-level professionals.